A Large-Scale Fault-Tolerant Distributed Software-Build Process
Abstract
A large software system can be compiled and linked more quickly if its build process is distributed across a network of multiple computers. However, large networks are more likely to contain a computer that causes a build to fail. If such a computer can be identified, it can be excluded from participation. Otherwise, if the failed command can be detected, it can be retried on a different computer. We describe our experiences designing, implementing, and maintaining a fault-tolerant distributed build process for an industrial software-development environment. We focus on techniques that augment the capabilities of available distributed-build tools. Our build process produces Hewlett-Packard’s laser printer firmware. Our environment includes hundreds of engineers, about one thousand computers, and about two million lines of code. As an example of the speedup provided by distribution, a forced sequential rebuild of all targets requiring about 155 minutes can be accomplished concurrently in about 35 minutes.
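The retry idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `run_with_retry`, the `known_bad` set, the host names, and the stand-in runner `fake_run` are all hypothetical, and a real build would dispatch commands through its distributed-build tool rather than a Python callback.

```python
from typing import Callable, Iterable, Optional, Set


def run_with_retry(
    command: str,
    hosts: Iterable[str],
    run_on: Callable[[str, str], bool],
    known_bad: Optional[Set[str]] = None,
) -> Optional[str]:
    """Run `command` on the first host that succeeds.

    Hosts in `known_bad` (computers already identified as faulty) are
    excluded up front. If a command fails on one host, it is retried on
    the next candidate. Returns the host that succeeded, or None if
    every candidate failed.
    """
    known_bad = known_bad or set()
    for host in hosts:
        if host in known_bad:
            continue  # identified as faulty: excluded from participation
        if run_on(host, command):
            return host
        # Failure detected: fall through and retry on a different host.
    return None


def fake_run(host: str, command: str) -> bool:
    """Hypothetical stand-in for remote execution: 'build02' always fails."""
    return host != "build02"


winner = run_with_retry(
    "cc -c a.c",
    ["build01", "build02", "build03"],
    fake_run,
    known_bad={"build01"},
)
print(winner)
```

In this sketch `build01` is skipped because it is already known to be bad, `build02` fails when tried, and the command is then retried on `build03`.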
Similar resources
An Agreement Service for Implementing Fault Tolerant Distributed Software
Distributed systems include a large number of processors, which increases the risk of failures. Fault tolerance is of key importance in such systems. Implementing fault-tolerant distributed software (FTDS) is a difficult task [2]. Group communication services [8] such as group membership and reliable multicast have been proposed to solve some of the problems in implementing FTDS. In this paper w...
Somersault: Enabling Fault-Tolerant Distributed Software Systems
Keywords: fault-tolerant, CORBA, process replication, process mirroring, high availability. Somersault is a platform for developing distributed fault-tolerant software components and integrating these critical components with other components into distributed system solutions. Critical application processes are mirrored across a network, with each critical process being replicated in a primary and seconda...
A New Proactive Fault Tolerant Approach for Scheduling in Computational Grid
Grid Computing provides non-trivial services to users and aggregates the power of widely distributed resources. Computational grids solve large scale scientific problems using distributed heterogeneous resources. The Grid Scheduler must select proper resources for executing the tasks with less response time and without missing the deadline. There are various reasons such as network failure, ove...
Bio-inspired Fault Tolerant and Adaptive System Modeling and Simulation on the Grid
Grid computing, which is characterized by large-scale sharing of and cooperation among distributed resources, is becoming a mainstream technology in distributed computing. In this paper, we present the idea of applying grid-computing technology to model and simulate large-scale, high-performance, bio-inspired fault-tolerant and adaptable control systems. A grid-based workflow management service is employed...
Somersault Software Fault-Tolerance
Keywords: software fault-tolerance, process replication, failure masking, continuous availability, topology. The ambition of fault-tolerant systems is to provide application-transparent fault-tolerance at the same performance as a non-fault-tolerant system. Somersault is a library for developing distributed fault-tolerant software systems that comes close to achieving both goals. We describe Somersault and...
Publication date: 2005